feat(pkg/p2p): reconnect on disconnected peers #3212
julienrbrt wants to merge 10 commits into main from
Conversation
> Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews, or the checkboxes below for quick actions.
📝 Walkthrough

Adds periodic DHT-based peer discovery and a background reconnect/maintenance loop to the P2P client, and re-enables local Go module `replace` directives.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as P2P Client
    participant Host as libp2p Host
    participant DHT as DHT/Discovery
    participant Notifee as Disconnect Notifee
    participant Maintenance as Maintenance Loop
    Client->>Host: startWithHost()
    Client->>Notifee: register disconnect handler
    Client->>Maintenance: start maintenance goroutine
    Note over Host,Notifee: network event
    Host-->>Notifee: peer disconnected
    Notifee->>Maintenance: notify reconnectCh
    Maintenance->>Maintenance: apply per-peer backoff
    Maintenance->>Host: tryConnect(peer) (limited by connectSem)
    Host-->>Maintenance: connection result
    Maintenance->>DHT: periodic refreshPeerDiscovery()
    DHT->>Host: re-advertise namespace
    DHT-->>Maintenance: discovered peers
    Maintenance->>Host: tryConnect(discovered peers)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @julienrbrt's task in 1m 32s — View job

Code Review

Summary

The PR adds a background maintenance loop for periodic DHT re-discovery and a reconnect path for disconnected peers.

Critical Issues

1. Goroutine leak in
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #3212      +/-   ##
==========================================
- Coverage   61.43%   61.37%   -0.07%
==========================================
  Files         120      120
  Lines       12504    12562      +58
==========================================
+ Hits         7682     7710      +28
- Misses       3960     3989      +29
- Partials      862      863       +1
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Actionable comments posted: 3
🧹 Nitpick comments (1)
apps/testapp/go.mod (1)
5-5: Keep this `replace` local to development.

This ties the module to `../../.` being present at build time. That works inside this checkout, but not when the submodule is built outside the monorepo layout. Prefer a local `go.work` or an uncommitted dev override instead. Please verify this module is not expected to build outside the repository layout; if `apps/testapp` is intentionally monorepo-only, feel free to ignore this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/testapp/go.mod` at line 5, The committed go.mod contains a local replace directive ("replace github.com/evstack/ev-node => ../../.") that forces a monorepo layout; remove this replace from the tracked go.mod and instead keep it as an uncommitted developer-only override (or move the replacement into a go.work used only in development), or if this module is intended to be monorepo-only, document that restriction; update the repository by deleting the replace line from the committed go.mod and adding a dev-only go.work or .gitignore'd override as appropriate so builds outside the repo layout are not broken.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/evm/go.mod`:
- Around line 5-8: The go.mod contains committed relative replace directives for
modules "github.com/evstack/ev-node" and
"github.com/evstack/ev-node/execution/evm" (the two replace (...) entries) which
will break consumers outside the monorepo; remove these relative replace lines
from apps/evm/go.mod before committing and, for local development only, move
them into a go.work file or use developer instructions to run `go work use` so
local overrides remain available but are not shipped in the module's go.mod.
In `@pkg/p2p/client.go`:
- Around line 343-355: The reconnect loop currently only receives signals from
reconnectCh (fed by Disconnected) so a failed host.Connect leaves the peer
without future triggers, and the current logic drops events that arrive during
the backoff window; update the reconnect handling (references: reconnectCh,
Disconnected, host.Connect, st.attempts, st.lastAttempt, reconnectCooldown,
maxReconnectCooldown) so that on a connect failure you re-schedule another retry
(e.g., push a retry event into reconnectCh or start a time.AfterFunc/goroutine
to re-enqueue after computed backoff) and when an event arrives inside the
backoff window do not drop it but schedule it to be processed after the
remaining backoff interval (use a timer to requeue into reconnectCh); apply the
same scheduling change to the other similar block (lines around the second
instance handling backoff) so transient outages cause repeated retries rather
than a single attempt.
- Line 199: The maintenance goroutine is started with the caller-supplied ctx so
timeouts/cancels on Start() prematurely stop reconnection/discovery; change this
so maintenance uses its own background context owned by the client and only
stops when Close() is called. Create a maintenance context via
context.WithCancel(context.Background()) (store the cancel func on the client
struct), pass that maintenance context into startConnectionMaintenance (and the
other maintenance/startDiscovery loops currently using the Start() ctx around
lines 317-320), and invoke the stored cancel in Close(); do not use the Start()
ctx for these long-lived loops.
---
Nitpick comments:
In `@apps/testapp/go.mod`:
- Line 5: The committed go.mod contains a local replace directive ("replace
github.com/evstack/ev-node => ../../.") that forces a monorepo layout; remove
this replace from the tracked go.mod and instead keep it as an uncommitted
developer-only override (or move the replacement into a go.work used only in
development), or if this module is intended to be monorepo-only, document that
restriction; update the repository by deleting the replace line from the
committed go.mod and adding a dev-only go.work or .gitignore'd override as
appropriate so builds outside the repo layout are not broken.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1c692055-7514-4e34-ad88-0d6d76c89f7d
⛔ Files ignored due to path filters (2)
`apps/evm/go.sum` is excluded by `!**/*.sum`
`apps/testapp/go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (6)
- CHANGELOG.md
- apps/evm/go.mod
- apps/testapp/go.mod
- pkg/p2p/client.go
- pkg/signer/aws/signer.go
- pkg/signer/gcp/signer.go
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/p2p/client.go (1)
351-352: ⚠️ Potential issue | 🟠 Major

Backoff still turns a transient seed outage into a single retry.

At Line 351 the event is dropped if the peer is still inside its backoff window, and at Line 367 a failed `host.Connect` only logs. After that first failed dial there is no future trigger for that seed peer, so it can remain disconnected indefinitely.

Also applies to: 367-369
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/p2p/client.go` around lines 351 - 352, The code currently skips processing a seed peer if time.Now().Before(st.lastAttempt.Add(backoff)) and only logs on host.Connect failure, which means a transient outage may never be retried; change this so that when a peer is within backoff you requeue/schedule it for a future attempt instead of dropping the event (use a timer or push it back onto the attempt queue and keep state in st), and when host.Connect(peer) fails update st.lastAttempt and also schedule a follow-up retry (rather than only logging) so the peer will be retried after backoff; make these changes around the references to st.lastAttempt.Add(backoff) and the host.Connect call to ensure failed dials trigger future retries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/p2p/client.go`:
- Around line 317-321: startConnectionMaintenance currently only handles future
Disconnected events so seed peers that are already disconnected never get
enqueued; before the maintenance select loop begins (inside the maintenanceWg.Go
started by startConnectionMaintenance), iterate c.seedPeers and for each peer
that is not currently connected (use existing connection state check used
elsewhere in the file), send that peer into c.reconnectCh once so they will be
retried; ensure sends won't block the worker (use a non-blocking select with
default or spawn a short goroutine per send) and rely on
refreshPeerDiscovery/c.getNamespace as before for discovery updates.
- Around line 388-389: The current loop repeatedly calls c.advertise(ctx) which
spawns a background re-advertisement goroutine each time (discutil.Advertise
runs until ctx.Done()), causing a goroutine leak; fix by invoking
c.advertise(ctx) exactly once from the maintenance context (e.g., start it when
maintenanceCtx is created) and change the ticker loop to only call peer
discovery (e.g., c.FindPeers or the method that refreshes discovery) on each
tick, or alternatively modify c.advertise to provide a single-shot advertise API
and call that from the ticker; locate and update the code paths referencing
c.advertise(ctx) and the ticker loop so no repeated long-lived advertise
goroutines are started.
---
Duplicate comments:
In `@pkg/p2p/client.go`:
- Around line 351-352: The code currently skips processing a seed peer if
time.Now().Before(st.lastAttempt.Add(backoff)) and only logs on host.Connect
failure, which means a transient outage may never be retried; change this so
that when a peer is within backoff you requeue/schedule it for a future attempt
instead of dropping the event (use a timer or push it back onto the attempt
queue and keep state in st), and when host.Connect(peer) fails update
st.lastAttempt and also schedule a follow-up retry (rather than only logging) so
the peer will be retried after backoff; make these changes around the references
to st.lastAttempt.Add(backoff) and the host.Connect call to ensure failed dials
trigger future retries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
```go
func (c *Client) startConnectionMaintenance() {
	ctx, cancel := context.WithCancel(context.Background())
	c.maintenanceCancel = cancel

	c.maintenanceWg.Go(func() {
```
Queue currently disconnected seed peers when maintenance starts.
This worker only reacts to future `Disconnected` events. A seed peer that never connected during startup, or dropped before the notifee was registered, never enters `reconnectCh`, and `refreshPeerDiscovery` only searches `c.getNamespace()`. Seed any currently disconnected `c.seedPeers` once before the select loop begins.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/p2p/client.go` around lines 317 - 321, startConnectionMaintenance
currently only handles future Disconnected events so seed peers that are already
disconnected never get enqueued; before the maintenance select loop begins
(inside the maintenanceWg.Go started by startConnectionMaintenance), iterate
c.seedPeers and for each peer that is not currently connected (use existing
connection state check used elsewhere in the file), send that peer into
c.reconnectCh once so they will be retried; ensure sends won't block the worker
(use a non-blocking select with default or spawn a short goroutine per send) and
rely on refreshPeerDiscovery/c.getNamespace as before for discovery updates.
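That seeding step can be sketched as a small standalone helper. The names `seedPeers`, `reconnectCh`, and the `isConnected` check are stand-ins for the client's real fields, and the non-blocking send mirrors the prompt's requirement that the worker never stall:

```go
package main

import "fmt"

// seedReconnectQueue enqueues every seed peer that is not currently connected,
// using a non-blocking send so the maintenance worker is never blocked.
// Returns how many peers were queued.
func seedReconnectQueue(seedPeers []string, isConnected func(string) bool, reconnectCh chan string) (queued int) {
	for _, p := range seedPeers {
		if isConnected(p) {
			continue
		}
		select {
		case reconnectCh <- p:
			queued++
		default:
			// Channel full: skip rather than block; periodic discovery
			// will pick the peer up on a later tick.
		}
	}
	return queued
}

func main() {
	ch := make(chan string, 2)
	connected := map[string]bool{"a": true}
	n := seedReconnectQueue([]string{"a", "b", "c"},
		func(p string) bool { return connected[p] }, ch)
	fmt.Println(n) // → 2, since "b" and "c" were disconnected
}
```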
| _ = c.advertise(ctx) | ||
|
|
🧩 Analysis chain
🌐 Web query:
For github.com/libp2p/go-libp2p v0.48.0, what does p2p/discovery/util.Advertise do? Does it start a background re-advertisement loop tied to ctx.Done(), or is it a single synchronous advertise call?
💡 Result:

In github.com/libp2p/go-libp2p/p2p/discovery/util v0.48.0, `util.Advertise(ctx, a, ns, ...)` starts a background goroutine that re-advertises in a loop until `ctx.Done()` is closed; it is not a single synchronous advertise call.

Behavior details:

- Calls `a.Advertise(ctx, ns, opts...)` repeatedly, receiving a `ttl`.
- Waits `7*ttl/8` before the next advertise (or exits early on `ctx.Done()`).
- On advertise error, logs, then (if the context isn't canceled) waits 2 minutes and retries (or exits on `ctx.Done()`).

Citations:

- https://raw.githubusercontent.com/libp2p/go-libp2p/v0.48.0/p2p/discovery/util/util.go
Fix goroutine leak in advertise refresh loop.
Calling c.advertise(ctx) every 5 minutes at line 388 spawns a new background goroutine with each call. In libp2p v0.48.0, discutil.Advertise starts a re-advertisement loop that runs until ctx.Done() — it is not a single synchronous call. Stacking these goroutines will leak resources for the client's lifetime.
Start c.advertise(ctx) once from the maintenance context and let the ticker only refresh peer discovery via FindPeers, or switch to a single-shot advertise call that does not spawn a background loop.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/p2p/client.go` around lines 388 - 389, The current loop repeatedly calls
c.advertise(ctx) which spawns a background re-advertisement goroutine each time
(discutil.Advertise runs until ctx.Done()), causing a goroutine leak; fix by
invoking c.advertise(ctx) exactly once from the maintenance context (e.g., start
it when maintenanceCtx is created) and change the ticker loop to only call peer
discovery (e.g., c.FindPeers or the method that refreshes discovery) on each
tick, or alternatively modify c.advertise to provide a single-shot advertise API
and call that from the ticker; locate and update the code paths referencing
c.advertise(ctx) and the ticker loop so no repeated long-lived advertise
goroutines are started.
Actionable comments posted: 1
♻️ Duplicate comments (2)
pkg/p2p/client.go (2)
317-320: ⚠️ Potential issue | 🟠 Major

Still unresolved: keep only one advertise worker alive.

`refreshPeerDiscovery()` runs on a ticker, but `advertise()` already starts a long-lived re-advertisement worker. Calling it again here stacks extra advertisers for the lifetime of the client. Start one advertiser from the maintenance context and let the ticker only refresh discovery. If maintenance owns the long-lived advertiser, remove the startup `peerDiscovery()` call too so there is still only one loop total.

For the go-libp2p version used by this repository, does `p2p/discovery/util.Advertise` start a background re-advertisement loop until `ctx.Done()`, or is it a one-shot advertise call?

Based on learnings: Applies to `**/*.go`: Be mindful of goroutine leaks in Go code.

Also applies to: 399-400
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/p2p/client.go` around lines 317 - 320, startConnectionMaintenance currently starts a maintenance context but also leads to multiple concurrent advertisers because refreshPeerDiscovery calls advertise repeatedly while advertise itself spawns a long-lived re-advertisement worker; to fix, ensure only one long-lived advertise worker is created from startConnectionMaintenance using the maintenance ctx (call advertise(ctx) once there), remove any other startup call to peerDiscovery() so you don't spawn a separate loop, and change refreshPeerDiscovery to only refresh discovery state (not call advertise) so it no longer creates additional advertisers; reference functions: startConnectionMaintenance, refreshPeerDiscovery, advertise, peerDiscovery and the maintenanceCancel/maintenance context.
321-324: ⚠️ Potential issue | 🟠 Major

Still unresolved: seed the reconnect queue with already-disconnected seed peers.

This worker still only reacts to future `Disconnected` callbacks. A configured seed peer that never connected during startup, or dropped before `Notify()` was registered, never reaches `reconnectCh`, so the reconnect path never runs for it.

🔧 Minimal fix

```diff
 c.maintenanceWg.Go(func() {
+	for _, sp := range c.seedPeers {
+		if c.isConnected(sp.ID) {
+			continue
+		}
+		select {
+		case c.reconnectCh <- sp.ID:
+		default:
+		}
+	}
+
 	discoveryTicker := time.NewTicker(peerDiscoveryInterval)
 	defer discoveryTicker.Stop()
```
Verify each finding against the current code and only fix it if needed. In `@pkg/p2p/client.go` around lines 321 - 324, The worker registered in c.maintenanceWg.Go only enqueues peers on future Disconnected callbacks, so configured seed peers that were never connected or disconnected before Notify was registered are never added to c.reconnectCh; fix by seeding the reconnect queue at startup: after creating discoveryTicker in the maintenance goroutine (inside the same block started by maintenanceWg.Go), iterate configured seed peers (the seeds list used by the peer discovery logic), check each peer's current connectedness via the host/peerstore/Network API (e.g., Host.Network().Connectedness / Peerstore.Connectedness), and send any peers that are not currently connected into c.reconnectCh (respecting non-blocking semantics or select with ctx.Done() to avoid blocking). Ensure this happens once before the ticker loop so the existing reconnect worker will process already-disconnected seed peers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/p2p/client.go`:
- Around line 206-209: Close() currently cancels maintenanceCancel and waits
only for maintenanceWg but discovery and dial goroutines (spawned by the
maintenance worker) are not time-boxed or tracked, so add a child context with a
short timeout for the periodic discovery/refresh and for each dial/Connect call
(derive via maintenanceCtx := context.WithTimeout(maintenanceCancelCtx,
shortDuration)), ensure every spawned dial worker is registered with the same
maintenanceWg (or use an errgroup bound to maintenanceCtx) so they are waited
on, pass the child ctx into the dial/Connect/DHT calls so they can be cancelled,
and guarantee connectSem is released in a defer inside each worker even on
timeout/cancellation; also ensure maintenanceCancel cancels the parent ctx so
all child timeouts abort during Close().
---
Duplicate comments:
In `@pkg/p2p/client.go`:
- Around line 317-320: startConnectionMaintenance currently starts a maintenance
context but also leads to multiple concurrent advertisers because
refreshPeerDiscovery calls advertise repeatedly while advertise itself spawns a
long-lived re-advertisement worker; to fix, ensure only one long-lived advertise
worker is created from startConnectionMaintenance using the maintenance ctx
(call advertise(ctx) once there), remove any other startup call to
peerDiscovery() so you don't spawn a separate loop, and change
refreshPeerDiscovery to only refresh discovery state (not call advertise) so it
no longer creates additional advertisers; reference functions:
startConnectionMaintenance, refreshPeerDiscovery, advertise, peerDiscovery and
the maintenanceCancel/maintenance context.
- Around line 321-324: The worker registered in c.maintenanceWg.Go only enqueues
peers on future Disconnected callbacks, so configured seed peers that were never
connected or disconnected before Notify was registered are never added to
c.reconnectCh; fix by seeding the reconnect queue at startup: after creating
discoveryTicker in the maintenance goroutine (inside the same block started by
maintenanceWg.Go), iterate configured seed peers (the seeds list used by the
peer discovery logic), check each peer's current connectedness via the
host/peerstore/Network API (e.g., Host.Network().Connectedness /
Peerstore.Connectedness), and send any peers that are not currently connected
into c.reconnectCh (respecting non-blocking semantics or select with ctx.Done()
to avoid blocking). Ensure this happens once before the ticker loop so the
existing reconnect worker will process already-disconnected seed peers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
```go
if c.maintenanceCancel != nil {
	c.maintenanceCancel()
}
c.maintenanceWg.Wait()
```
Track and time-box the maintenance dials.
Close() only waits for the dispatcher goroutine. Lines 373-381 and 413-416 spawn dial workers outside maintenanceWg, while Line 385 runs discovery inline on the only maintenance worker. A slow discovery/dial can block reconnect handling, pin connectSem, or still be running when teardown closes the host and DHT. Give the refresh and each dial a short child timeout, and track the spawned workers in the same wait group/errgroup.
As per coding guidelines "Use context.Context for cancellation in Go" and "Be mindful of goroutine leaks in Go code".
Also applies to: 373-381, 384-385, 401-419
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/p2p/client.go` around lines 206 - 209, Close() currently cancels
maintenanceCancel and waits only for maintenanceWg but discovery and dial
goroutines (spawned by the maintenance worker) are not time-boxed or tracked, so
add a child context with a short timeout for the periodic discovery/refresh and
for each dial/Connect call (derive via maintenanceCtx :=
context.WithTimeout(maintenanceCancelCtx, shortDuration)), ensure every spawned
dial worker is registered with the same maintenanceWg (or use an errgroup bound
to maintenanceCtx) so they are waited on, pass the child ctx into the
dial/Connect/DHT calls so they can be cancelled, and guarantee connectSem is
released in a defer inside each worker even on timeout/cancellation; also ensure
maintenanceCancel cancels the parent ctx so all child timeouts abort during
Close().
♻️ Duplicate comments (2)
pkg/p2p/client.go (2)
312-317: ⚠️ Potential issue | 🟠 Major

Dial workers are not tracked by the maintenance waitgroup and are not time-boxed.

The goroutines spawned at Line 314 are outside `maintenanceWg` and use the parent maintenance context without a per-dial timeout, so `Close()` can wait for the loop yet still race with in-flight dials. Track these workers in `maintenanceWg` and wrap each dial with a short `context.WithTimeout`.

🔧 Focused fix

```diff
 select {
 case c.connectSem <- struct{}{}:
-	go func(peer peer.AddrInfo) {
-		defer func() { <-c.connectSem }()
-		c.tryConnect(ctx, peer)
-	}(p)
+	peerInfo := p
+	c.maintenanceWg.Go(func() {
+		defer func() { <-c.connectSem }()
+		dialCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
+		defer cancel()
+		c.tryConnect(dialCtx, peerInfo)
+	})
 case <-ctx.Done():
 	return
 }
```

Based on learnings "Applies to `**/*.go`: Use context.Context for cancellation in Go" and "Applies to `**/*.go`: Be mindful of goroutine leaks in Go code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/p2p/client.go` around lines 312 - 317, The spawned dial goroutines started when sending to connectSem are not tracked by maintenanceWg and use the parent maintenance context without a per-dial timeout; modify the goroutine that calls tryConnect so you increment maintenanceWg (maintenanceWg.Add(1)) before launching, defer maintenanceWg.Done() inside the goroutine, and wrap the parent maintenance context with a short timeout via context.WithTimeout (e.g., ctxDial, cancel := context.WithTimeout(ctx, <reasonable-duration>); defer cancel()) and pass ctxDial into tryConnect; keep the existing defer that releases connectSem so the semaphore is still cleared even on timeout or cancellation.
300-301: ⚠️ Potential issue | 🟠 Major

`c.advertise(ctx)` in the ticker path can stack long-lived advertise loops.

Calling `c.advertise(ctx)` on every refresh tick (Line 300) is still risky if `discutil.Advertise` is loop-based for the lifetime of `ctx`; this can accumulate goroutines over time. Move advertise startup to a single call when maintenance starts, and keep the ticker for peer refresh only.

For github.com/libp2p/go-libp2p v0.48.0, does p2p/discovery/util.Advertise start a background re-advertisement loop until ctx.Done(), or is it a one-shot synchronous advertise call?

As per coding guidelines "Be mindful of goroutine leaks in Go code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/p2p/client.go` around lines 300 - 301, The call to c.advertise(ctx) inside the refresh ticker can spawn long-lived advertise loops repeatedly if discutil.Advertise runs until ctx.Done(); remove the per-tick call and instead start advertising once when maintenance begins: call c.advertise(ctx) (or spawn a single goroutine that calls discutil.Advertise with the maintenance context) in the maintenance startup path, keep the ticker handling only peer refresh logic, and ensure the advertise goroutine is tied to the same cancellable context (or returns on a done channel) so it is stopped when maintenance stops to avoid goroutine leaks; update any start/stop or runMaintenance functions to start that single advertise instance and to cancel it on shutdown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/p2p/client.go`:
- Around line 312-317: The spawned dial goroutines started when sending to
connectSem are not tracked by maintenanceWg and use the parent maintenance
context without a per-dial timeout; modify the goroutine that calls tryConnect
so you increment maintenanceWg (maintenanceWg.Add(1)) before launching, defer
maintenanceWg.Done() inside the goroutine, and wrap the parent maintenance
context with a short timeout via context.WithTimeout (e.g., ctxDial, cancel :=
context.WithTimeout(ctx, <reasonable-duration>); defer cancel()) and pass
ctxDial into tryConnect; keep the existing defer that releases connectSem so the
semaphore is still cleared even on timeout or cancellation.
- Around line 300-301: The call to c.advertise(ctx) inside the refresh ticker
can spawn long-lived advertise loops repeatedly if discutil.Advertise runs until
ctx.Done(); remove the per-tick call and instead start advertising once when
maintenance begins: call c.advertise(ctx) (or spawn a single goroutine that
calls discutil.Advertise with the maintenance context) in the maintenance
startup path, keep the ticker handling only peer refresh logic, and ensure the
advertise goroutine is tied to the same cancellable context (or returns on a
done channel) so it is stopped when maintenance stops to avoid goroutine leaks;
update any start/stop or runMaintenance functions to start that single advertise
instance and to cancel it on shutdown.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/p2p/client.go (1)
190-191: Consider passing a context parameter to satisfy the linter, even if unused.

The golangci-lint `contextcheck` rule flags this because `startConnectionMaintenance` doesn't receive the context from `startWithHost`. While the design is correct (maintenance uses `context.Background()` to decouple from the caller), you could either:

- Disable contextcheck for this line with `//nolint:contextcheck`
- Accept a `context.Context` parameter but explicitly ignore it, documenting why

Option 1: Disable linter for this line

```diff
 c.host.Network().Notify(c.newDisconnectNotifee())
-c.startConnectionMaintenance()
+c.startConnectionMaintenance() //nolint:contextcheck // maintenance uses independent background context
```
Verify each finding against the current code and only fix it if needed. In `@pkg/p2p/client.go` around lines 190 - 191, The linter flags startConnectionMaintenance for not accepting the caller context; update startConnectionMaintenance to accept a context.Context parameter (e.g., func (c *Client) startConnectionMaintenance(ctx context.Context)) and change the call in startWithHost from c.startConnectionMaintenance() to c.startConnectionMaintenance(ctx). Inside startConnectionMaintenance, explicitly ignore the passed-in ctx (e.g., _ = ctx) and add a short comment explaining you intentionally use context.Background() internally to decouple maintenance from the caller; this satisfies contextcheck while preserving the original design.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/p2p/client.go`:
- Around line 308-325: The code calls maintenanceWg.Go(...) but maintenanceWg is
a sync.WaitGroup which has no Go method; replace that call with the manual
WaitGroup goroutine pattern in startConnectionMaintenance: call
maintenanceWg.Add(1) before launching a goroutine, then start a plain go func()
that defers maintenanceWg.Done() and contains the existing ticker loop/body
(including ctx, cancel, discoveryTicker). This preserves the existing behavior
without changing maintenanceWg's type; alternatively, if you prefer to keep the
Go() usage, change the maintenanceWg field to an errgroup.Group and update
callers, but the minimal fix is to use Add(1) + go func() + defer Done() around
the current closure.
---
Nitpick comments:
In `@pkg/p2p/client.go`:
- Around line 190-191: The linter flags startConnectionMaintenance for not
accepting the caller context; update startConnectionMaintenance to accept a
context.Context parameter (e.g., func (c *Client) startConnectionMaintenance(ctx
context.Context)) and change the call in startWithHost from
c.startConnectionMaintenance() to c.startConnectionMaintenance(ctx). Inside
startConnectionMaintenance, explicitly ignore the passed-in ctx (e.g., _ = ctx)
and add a short comment explaining you intentionally use context.Background()
internally to decouple maintenance from the caller; this satisfies contextcheck
while preserving the original design.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
```go
func (c *Client) startConnectionMaintenance() {
	ctx, cancel := context.WithCancel(context.Background())
	c.maintenanceCancel = cancel

	c.maintenanceWg.Go(func() {
		discoveryTicker := time.NewTicker(peerDiscoveryInterval)
		defer discoveryTicker.Stop()

		for {
			select {
			case <-ctx.Done():
				return
			case <-discoveryTicker.C:
				c.refreshPeerDiscovery(ctx)
			}
		}
	})
}
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Verify sync.WaitGroup usage and check if errgroup is used elsewhere
rg "sync\.WaitGroup" --type go -l | head -5
rg "errgroup\.Group" --type go -l | head -5
```

Repository: evstack/ev-node
Length of output: 235

🏁 Script executed:

```bash
#!/bin/bash
# Find the Client struct definition and maintenanceWg field type
ast-grep --pattern $'type Client struct {
  $$$
  maintenanceWg $_
  $$$
}'
```

Repository: evstack/ev-node
Length of output: 944

🏁 Script executed:

```bash
#!/bin/bash
# Check imports in pkg/p2p/client.go
head -30 pkg/p2p/client.go | cat -n
```

Repository: evstack/ev-node
Length of output: 1132

🏁 Script executed:

```bash
#!/bin/bash
# Broader search for maintenanceWg declaration
rg "maintenanceWg\s*[:\=]" --type go -A 1 -B 1 pkg/p2p/
```

Repository: evstack/ev-node
Length of output: 164

🏁 Script executed:

```bash
#!/bin/bash
# Double-check: search for any other imports or type definitions that might affect maintenanceWg
rg "maintenanceWg" pkg/p2p/client.go | head -10
```

Repository: evstack/ev-node
Length of output: 145
sync.WaitGroup only gained a Go() method in Go 1.25, so on earlier toolchains this code will not compile.
The maintenanceWg field is declared as sync.WaitGroup (line 70). Before Go 1.25 that type provides only Add(), Done(), and Wait(). Either require Go 1.25+ in the module, switch to golang.org/x/sync/errgroup.Group (whose Go() method is available on all supported Go versions), or use the manual pattern:
Fix using manual WaitGroup pattern

```diff
 func (c *Client) startConnectionMaintenance() {
 	ctx, cancel := context.WithCancel(context.Background())
 	c.maintenanceCancel = cancel
-	c.maintenanceWg.Go(func() {
+	c.maintenanceWg.Add(1)
+	go func() {
+		defer c.maintenanceWg.Done()
 		discoveryTicker := time.NewTicker(peerDiscoveryInterval)
 		defer discoveryTicker.Stop()
 		for {
 			select {
 			case <-ctx.Done():
 				return
 			case <-discoveryTicker.C:
 				c.refreshPeerDiscovery(ctx)
 			}
 		}
-	})
+	}()
 }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
func (c *Client) startConnectionMaintenance() {
	ctx, cancel := context.WithCancel(context.Background())
	c.maintenanceCancel = cancel

	c.maintenanceWg.Add(1)
	go func() {
		defer c.maintenanceWg.Done()
		discoveryTicker := time.NewTicker(peerDiscoveryInterval)
		defer discoveryTicker.Stop()

		for {
			select {
			case <-ctx.Done():
				return
			case <-discoveryTicker.C:
				c.refreshPeerDiscovery(ctx)
			}
		}
	}()
}
```
Attempt to reconnect to peers when there are connectivity issues.
PR made by AI. Still testing.
Summary by CodeRabbit